Internet Archive Web Crawls

The Internet Archive discovers and captures web pages through many different web crawls.



2004 Election
collection, 178 items, 11.7M views
2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
Live Web Proxy Crawls
web, 683,418 views, 0 favorites, 0 comments
YouTube 2007 Crawl
collection, 73,847 items, 10.9M views
Data crawled from YouTube.com in 2007 by Internet Archive. These files are not currently accessible.
Live Web Proxy Crawls
web, 1.9M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 9.2M views, 0 favorites, 0 comments
World Wars Crawl
collection, 13 items, 12.3M views
Web data related to World Wars I and II, collected by Internet Archive in an experimental crawl sponsored by the National Endowment for the Humanities and JISC. This data is currently not publicly accessible.
Live Web Proxy Crawls
web, 7.9M views, 0 favorites, 0 comments
Survey Crawl Number 8
web, 586,669 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl339.us.archive.org:survey from Tue Jan 22 09:37:30 PST 2019 to Tue Jan 22 03:52:56 PST 2019.
Topic: crawldata
Survey Crawl Number 8
web, 583,119 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Thu Jan 10 16:19:12 PST 2019 to Thu Jan 10 14:12:18 PST 2019.
Topic: crawldata
Survey Crawl Number 8
web, 588,868 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl818.us.archive.org:survey from Sat Jan 12 11:20:21 PST 2019 to Sat Jan 12 08:14:51 PST 2019.
Topic: crawldata
End Of Term 2016 Library of Congress Crawls
collection, 3,892 items, 5M views
End of Term 2016 Web Archive government web crawls by project partner the Library of Congress.
Topics: end of term, federal government, 2016, president, congress, library of congress, web, data, library...
Survey Crawl Number 8
web, 541,389 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl835.us.archive.org:survey from Wed Jan 16 13:02:07 PST 2019 to Wed Jan 16 13:30:45 PST 2019.
Topic: crawldata
Live Web Proxy Crawls
web, 17.9M views, 7 favorites, 1 comment
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sun Mar 27 22:10:09 PDT 2011 to Mon Mar 28 05:27:05 PDT 2011.
Reviews: 1
Topic: crawldata
Live Web Proxy Crawls
web, 2M views, 1 favorite, 0 comments
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 2.2M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Shallow Crawl Started February 2013
collection, 86 items, 15.8M views
Shallow crawl started February 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Thu Oct 12 08:48:34 PDT 2017 to Thu Oct 12 01:56:31 PDT 2017.
Topic: crawldata
Live Web Proxy Crawls
web, 506,472 views, 0 favorites, 0 comments
End of Term 2008 UNT Dot Gov Crawl
collection, 282 items, 9.2M views
End of Term 2008 crawl of .gov domains gathered by the University of North Texas. This data is currently not publicly accessible. UNT is a student-focused, public research university located in Denton, Texas.
Hurricane Katrina
collection, 112 items, 8.9M views
Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
Data crawled by National Endowment for the Humanities and JISC on behalf of Internet Archive from Fri Aug 08 00:17:40 PDT 2008 to Thu Jun 26 05:29:33 PDT 2008
Topic: crawldata
Live Web Proxy Crawls
web, 437,632 views, 0 favorites, 0 comments
Shallow Crawl Started November 2012
collection, 137 items, 9.4M views
Shallow crawl started November 2012 that collects content 1 level deep, including embeds. This data is currently not publicly accessible.
Live Web Proxy Crawls
web, 1.7M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 530,984 views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 484,296 views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 1.5M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 1.7M views, 0 favorites, 0 comments
Wide Crawl started February 2014
web, 6.8M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web, 7M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Nov 09 14:00:21 PDT 2007 to Sun Feb 18 00:14:41 PDT 2007
Topic: crawldata
Wide Crawl started February 2014
web, 7M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Live Web Proxy Crawls
web, 6.9M views, 0 favorites, 0 comments
Wide Crawl started February 2014
web, 6.9M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web, 6.9M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Survey Crawl Number 6: Sep 11th, 2017 - running now
web, 271,712 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Sat Sep 30 10:02:40 PDT 2017 to Sat Sep 30 03:34:44 PDT 2017.
Topic: crawldata
Live Web Proxy Crawls
web, 163,073 views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 924,201 views, 0 favorites, 0 comments
Wide Crawl started October 2011
web, 936,454 views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Oct 7 04:29:15 PDT 2011 to Fri Oct 7 00:11:17 PDT 2011.
Topic: crawldata
UNT Web
collection, 35 items, 5.5M views
This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot
Data crawled by Internet Archive on behalf of Internet Archive from Mon Feb 14 02:02:18 PDT 2005 to Sun Aug 14 14:02:49 PDT 2005
Topic: crawldata
Wayback Robots Crawl
collection, 129 items, 6.4M views
Wayback robots.txt crawl performed by Internet Archive. This data is currently not publicly accessible.
Wide Crawl started October 2011
web, 806,001 views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Oct 7 14:36:17 PDT 2011 to Fri Oct 7 08:44:43 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Mon Mar 14 23:31:25 PDT 2005 to Tue Nov 01 20:02:03 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 1.5M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 909,053 views, 0 favorites, 0 comments
2004 Indian Ocean earthquake and tsunami
collection, 42 items, 5.6M views
Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
Data crawled by Internet Archive on behalf of Internet Archive from Tue Apr 19 08:34:16 PDT 2005 to Wed Oct 19 19:17:43 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 3.9M views, 0 favorites, 0 comments
Top 150 Crawl
collection, 30 items, 5.6M views
Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.
Live Web Proxy Crawls
web, 6.2M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Mon Mar 28 12:43:47 PDT 2011 to Mon Mar 28 16:56:17 PDT 2011.
Topic: crawldata
Survey Crawl Number 8
web, 721,705 views, 1 favorite, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl838.us.archive.org:survey from Fri Jan 18 06:42:06 PST 2019 to Fri Jan 18 00:46:03 PST 2019.
Topic: crawldata
Survey Crawl Number 8
web, 721,356 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl835.us.archive.org:survey from Tue Jan 29 00:07:09 PST 2019 to Mon Jan 28 17:38:10 PST 2019.
Topic: crawldata
Live Web Proxy Crawls
web, 2.3M views, 0 favorites, 0 comments
Data crawled by Internet Archive on behalf of Internet Archive from Fri Apr 13 20:41:40 PDT 2007 to Tue Sep 11 22:21:27 PDT 2007
Topic: crawldata
Live Web Proxy Crawls
web, 621,536 views, 0 favorites, 0 comments
Survey Crawl Number 8
web, 994,599 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl818.us.archive.org:survey from Tue Jan 8 08:38:27 PST 2019 to Tue Jan 8 07:59:00 PST 2019.
Topic: crawldata
Live Web Proxy Crawls
web, 308,032 views, 0 favorites, 0 comments
Shallow Crawl Started 2013
collection, 18 items, 3.5M views
Shallow crawl started 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
UK Government Site Crawl
collection, 107 items, 5.2M views
Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Data crawled by Internet Archive on behalf of Internet Archive from Fri Nov 01 06:23:33 PDT 2002 to Tue Nov 19 23:24:07 PDT 2002
Reviews: 1
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Tue Jun 13 00:55:34 PDT 2017 to Mon Jun 12 19:36:27 PDT 2017.
Topic: crawldata
Live Web Proxy Crawls
web, 3.5M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 478,871 views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 1.6M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 4.4M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Wed Mar 30 03:11:49 PDT 2011 to Wed Mar 30 10:24:04 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
web, 81,398 views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Mon Feb 12 21:42:38 PST 2018 to Mon Feb 12 15:20:34 PST 2018.
Topic: crawldata
Live Web Proxy Crawls
web, 1.3M views, 0 favorites, 0 comments
Survey Crawl Number 5: Oct 21st, 2016 to Sep 10th, 2017
web, 133,289 views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl817.us.archive.org:survey from Fri Jul 14 06:02:55 PDT 2017 to Fri Jul 14 02:22:22 PDT 2017.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Sep 26 12:13:03 PDT 2003 to Wed Dec 10 23:25:51 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Sep 18 12:46:38 PDT 2004 to Thu May 05 09:34:36 PDT 2005
Topic: crawldata